You are viewing the RapidMiner Studio documentation for version 10.0.
Sample Collection (Operator Toolbox)
Synopsis
A collection is a list of items. This operator allows you to take a collection and sample it to a given sample size.
Description
The operator provides 3 different sampling methods (see parameter description) to perform the sampling. The parameter sample_size describes the number of items in the sampled output collection.
Input
- exa (Collection)
The collection which should be sampled.
Output
- col (Collection)
The sampled collection.
- org (Collection)
The original collection.
Parameters
- sampling_method
The method to use for sampling.
- linear sampling: Take the first n objects of the collection.
- shuffled sampling: Take n distinct objects of the collection, chosen at random (sampling without replacement).
- bootstrap sampling: Take n random objects of the collection with replacement, so the same object can be drawn several times.
- sample_size
The number of objects to be drawn. Range:
- use_local_random_seed
This parameter indicates whether a local random seed should be used. Range:
- local_random_seed
If the use_local_random_seed parameter is checked, this parameter sets the local random seed. Range:
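The three sampling methods and the seed parameters can be illustrated outside RapidMiner with a short Python sketch. This is not the operator's implementation. The function name `sample_collection` and the short method labels are hypothetical, and the behavior when sample_size exceeds the collection size is an assumption (here, shuffled sampling raises an error).

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_collection(
    items: Sequence[T],
    sample_size: int,
    sampling_method: str = "shuffled",
    local_random_seed: Optional[int] = None,
) -> List[T]:
    """Hypothetical helper mirroring the three sampling methods."""
    # A dedicated generator plays the role of the local random seed;
    # passing None leaves the generator seeded by the system.
    rng = random.Random(local_random_seed)

    if sampling_method == "linear":
        # Take the first n objects, in their original order.
        return list(items[:sample_size])
    if sampling_method == "shuffled":
        # Draw n distinct objects (without replacement).
        # Raises ValueError if sample_size > len(items).
        return rng.sample(list(items), sample_size)
    if sampling_method == "bootstrap":
        # Draw n objects with replacement; repeats are possible.
        return [rng.choice(items) for _ in range(sample_size)]
    raise ValueError(f"unknown sampling method: {sampling_method}")


if __name__ == "__main__":
    collection = ["a", "b", "c", "d", "e"]
    print(sample_collection(collection, 2, "linear"))
    print(sample_collection(collection, 3, "shuffled", local_random_seed=42))
    print(sample_collection(collection, 8, "bootstrap", local_random_seed=42))
```

With a fixed seed, repeated calls return the same sample, which is the purpose of a local random seed. Note that only bootstrap sampling can return more objects than the collection contains.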
Tutorial Processes
Grouping an ExampleSet into a collection
In this process we group the Titanic data set into bins of passenger fare. Then we select 2 random price ranges.
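The idea of the tutorial can be sketched in plain Python as follows. The fare values are made up for illustration and are not the Titanic data, and the fixed bin width of 20 and the seed are assumptions; the tutorial process itself may bin the fares differently.

```python
import random
from collections import defaultdict

# Hypothetical fare values, for illustration only (not the Titanic data).
fares = [7.25, 71.28, 7.93, 53.10, 8.05, 8.46, 51.86, 21.08, 11.13, 30.07]
bin_width = 20.0  # assumed width of each price range

# Group the fares into price ranges keyed by (lower, upper) bounds.
bins = defaultdict(list)
for fare in fares:
    lower = int(fare // bin_width) * bin_width
    bins[(lower, lower + bin_width)].append(fare)

# The collection holds one group per price range, ordered by range.
collection = [bins[key] for key in sorted(bins)]

# Shuffled sampling with sample_size = 2: pick two distinct price ranges.
rng = random.Random(1992)  # assumed seed, fixed for reproducibility
selected = rng.sample(collection, 2)

for group in selected:
    print(group)
```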